Leave-one-out Word Alignment without Garbage Collector Effects

Authors

  • Xiaolin Wang
  • Masao Utiyama
  • Andrew M. Finch
  • Taro Watanabe
  • Eiichiro Sumita
Abstract

Expectation-maximization algorithms, such as those implemented in GIZA++, pervade the field of unsupervised word alignment. However, these algorithms suffer from over-fitting, which leads to "garbage collector effects," where rare words tend to be erroneously aligned to untranslated words. This paper proposes a leave-one-out expectation-maximization algorithm for unsupervised word alignment to address this problem. The proposed method excludes information derived from the alignment of a sentence pair from the alignment models used to align it. This prevents erroneous alignments within a sentence pair from supporting themselves. Experimental results on Chinese-English and Japanese-English corpora show that the F1, precision and recall of alignment were consistently increased by 5.0%–17.2%, and BLEU scores of end-to-end translation were raised by 0.03–1.30. The proposed method also outperformed l0-normalized GIZA++ and Kneser-Ney smoothed GIZA++.
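
The leave-one-out idea described in the abstract can be illustrated with a minimal sketch. It assumes an IBM Model 1 style lexical translation table, which is only one of the models GIZA++ trains, and it uses a small additive floor to keep probabilities non-zero; both choices are assumptions made here for illustration and are not taken from the paper. The names `loo_model1`, `NULL`, and `floor` are hypothetical. The key step is that, when a sentence pair is re-aligned, its own expected counts are subtracted from the corpus totals first.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical token for the empty (untranslated) source position


def loo_model1(corpus, iterations=5, floor=1e-6):
    """Leave-one-out EM for a Model 1 style lexicon (illustrative sketch only).

    corpus: list of (e_words, f_words); each f word aligns to one e word or NULL.
    Returns one dict per sentence pair mapping (f, e) to its alignment posterior,
    summed over repeated tokens.
    """
    f_vocab = {f for _, fs in corpus for f in fs}

    # Start from uniform posteriors: local[n][(f, e)] is the expected count
    # that sentence pair n contributes for the word pair (f, e).
    local = []
    for es, fs in corpus:
        cands = [NULL] + list(es)
        counts = defaultdict(float)
        for f in fs:
            for e in cands:
                counts[(f, e)] += 1.0 / len(cands)
        local.append(counts)

    for _ in range(iterations):
        # Corpus-wide totals built from every sentence pair's current counts.
        total = defaultdict(float)
        marg = defaultdict(float)
        for counts in local:
            for (f, e), v in counts.items():
                total[(f, e)] += v
                marg[e] += v

        new_local = []
        for (es, fs), counts in zip(corpus, local):
            own_marg = defaultdict(float)
            for (f, e), v in counts.items():
                own_marg[e] += v
            cands = [NULL] + list(es)
            updated = defaultdict(float)
            for f in fs:
                scores = []
                for e in cands:
                    # Leave-one-out: remove this pair's own contribution, so an
                    # alignment cannot be supported only by itself.
                    num = max(total[(f, e)] - counts[(f, e)], 0.0) + floor
                    den = max(marg[e] - own_marg[e], 0.0) + floor * len(f_vocab)
                    scores.append(num / den)
                z = sum(scores)
                for e, s in zip(cands, scores):
                    updated[(f, e)] += s / z
            new_local.append(updated)
        local = new_local
    return local


toy = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
    (["a", "house"], ["ein", "haus"]),
]
posteriors = loo_model1(toy)
```

In this toy corpus every word occurs in two sentence pairs, so evidence from the other pair survives the subtraction. A word pair that co-occurs only within the sentence being aligned keeps nothing but the floor value, which is the sense in which a sentence pair is prevented from supporting its own alignments.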

Similar resources

Conservative Garbage Collection on Distributed Shared Memory Systems

In this paper we present the design and implementation of a conservative garbage collection algorithm for distributed shared memory (DSM) applications that use weakly-typed languages like C or C++, and evaluate its performance. In the absence of language support to identify references, our algorithm constructs a conservative approximation of the set of cross-node references based on local infor...

Designing a Concurrent Hardware Garbage Collector for Small Embedded Systems

Today more and more functionality is packed into all kinds of embedded systems, making high-level languages, such as Java, increasingly attractive as implementation languages. However, certain aspects essential to high-level languages are much harder to address in a low-performance, small embedded system than on a desktop computer. One of these aspects is memory management with garbage collect...

Generational Garbage Collection for Lazy Functional Languages without Temporary Space Leaks

Generational garbage collection is an established method for creating efficient garbage collectors. Even a simple implementation where all nodes that survive one garbage collection are tenured, i.e., moved to an old generation, works well in strict languages. In a lazy language, however, such an implementation can create severe temporary space leaks. The temporary space leaks appear in programs t...

Generational Garbage Collection without Temporary Space Leaks for Lazy Functional Languages

Generational garbage collection is an established method for creating efficient garbage collectors. Even a simple implementation where all nodes that survive one garbage collection are tenured, i.e., moved to an old generation, works well in strict languages. In lazy languages, however, such an implementation can create severe temporary space leaks. The temporary space leaks appear in programs t...

Garbage Collecting Reactive Real-Time Systems

As real-time systems become more complex, the need for more sophisticated runtime kernel features arises. One such feature that substantially lessens the burden on the programmer is automatic memory management, or garbage collection. However, incorporating garbage collection in a real-time kernel is not an easy task. One needs to guarantee not only that sufficient memory will be reclaimed in o...

Publication date: 2015